Homework 2 - Michał Gromadzki

Importing libraries

Loading dataset

EDA and Preprocessing

Checking for nulls.

No nulls.
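Below is a minimal sketch of the null check with pandas. It assumes the dataset is held in a DataFrame called `df`. The toy frame uses hypothetical values for a few of the columns discussed in this notebook (age, BMI, children, smoker, charges); it is not the real data.

```python
import pandas as pd

# Hypothetical toy rows standing in for the loaded dataset
df = pd.DataFrame({
    "age": [19, 33, 52],
    "bmi": [27.9, 22.7, 30.1],
    "children": [0, 1, 2],
    "smoker": ["yes", "no", "no"],
    "charges": [12000.0, 4000.0, 9000.0],
})

# Count missing values per column; all zeros means nothing needs imputing
null_counts = df.isnull().sum()
print(null_counts)
has_nulls = bool(null_counts.any())
```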

Encoding categorical features.
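The notebook does not show which encoding was used. One common option is one-hot encoding with `pd.get_dummies`, sketched here on hypothetical rows. For a linear model with an intercept, `drop_first=True` avoids perfectly collinear dummy columns.

```python
import pandas as pd

# Hypothetical toy rows; "smoker" is a yes/no categorical column
df = pd.DataFrame({"age": [19, 33, 52], "smoker": ["yes", "no", "no"]})

# One-hot encode; for a binary feature, drop_first=True leaves a single
# 0/1 column "smoker_yes" instead of a redundant "smoker_no"/"smoker_yes" pair
encoded = pd.get_dummies(df, columns=["smoker"], drop_first=True, dtype=int)
print(encoded)
```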

Checking correlation.

A strong correlation with the charges is observed only for smoking.
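The following sketch shows one way to rank features by their Pearson correlation with the target after encoding. The numbers are hypothetical toy values chosen only to make the snippet run. They say nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical, already-encoded toy data (not the notebook's dataset)
df = pd.DataFrame({
    "age": [20, 35, 50, 28, 60],
    "bmi": [22.0, 31.5, 27.0, 35.2, 24.8],
    "smoker": [0, 1, 0, 1, 0],
    "charges": [3000.0, 25000.0, 9000.0, 30000.0, 13000.0],
})

# Pearson correlation of every feature with the target, strongest (by |r|) first
corr_with_charges = (
    df.corr()["charges"]
    .drop("charges")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_charges)
```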

Models

LinearRegression

RandomForest
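scikit-learn's `LinearRegression` fits ordinary least squares with an intercept. The numpy sketch below reproduces that kind of fit on synthetic, noiseless data to show what the linear model learns: one coefficient per feature plus an intercept. The helper names `fit_ols` and `predict_ols` are hypothetical. A random forest has no comparably short closed form, so it is not sketched here.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefs)."""
    # Prepend a column of ones so the first coefficient is the intercept
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]

def predict_ols(intercept, coefs, X):
    """Linear prediction: intercept + X @ coefs."""
    return intercept + X @ coefs

# Synthetic data with a known linear relationship (no noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coefs = np.array([2.0, -1.0, 0.5])
y = 3.0 + X @ true_coefs

intercept, coefs = fit_ols(X, y)
preds = predict_ols(intercept, coefs, X)
```

Because the synthetic target is exactly linear, the recovered intercept and coefficients match the true ones up to floating-point error.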

Homework

Selecting an observation

Decomposing model predictions using LIME

Creating LIME explainer

LinearRegression

RandomForest
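The idea behind a LIME explanation can be sketched in a few lines: perturb the explained observation, weight each perturbed sample by its proximity to the original, and fit a weighted linear surrogate to the black-box predictions. This is a simplified, hypothetical variant (`lime_sketch` is not part of any library). The lime package's `LimeTabularExplainer` by default discretizes continuous features into quartiles, which is why its explanations read as ranges such as "lower age" or "medium BMI". This sketch skips that step and perturbs in standardized units instead. The kernel width of 0.75·sqrt(n_features) is an assumption modeled on the lime package's default.

```python
import numpy as np

def lime_sketch(predict_fn, x, feature_std, num_samples=5000, kernel_width=None, seed=0):
    """Simplified LIME-style local explanation for one tabular instance.

    Returns (intercept, weights); weights are per-standard-deviation effects
    of each feature around x according to a weighted linear surrogate.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed default, see lead-in
    # Gaussian perturbations in standardized units, centered on x
    Z = rng.normal(size=(num_samples, d))
    Z[0] = 0.0  # keep the explained instance itself in the sample
    X_pert = x + Z * feature_std
    y_pert = predict_fn(X_pert)
    # Exponential proximity kernel on the distance to x (standardized space)
    dist = np.linalg.norm(Z, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares: scale each row by sqrt(weight), then solve
    A = np.column_stack([np.ones(num_samples), Z])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y_pert * sw, rcond=None)
    return beta[0], beta[1:]

# Usage: explain a hypothetical linear "model" at one instance
true_coefs = np.array([2.0, -1.0, 0.5])

def model(X):
    return 3.0 + X @ true_coefs

x = np.array([1.0, 0.0, -2.0])
std = np.array([1.0, 2.0, 4.0])
intercept, weights = lime_sketch(model, x, std)
```

For a linear model the surrogate is exact: the weights equal `true_coefs * std` and the intercept equals the model's prediction at `x`. For a random forest, the same weights only describe the model's behavior in the neighborhood of the explained observation.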

Comparing LIME decompositions for different observations in the dataset

LinearRegression

Observation #1

Observation #2

Observation #3

RandomForest

Observation #1

Observation #2

Observation #3

The explanations appear to be stable. Not being a smoker always decreases the predicted charges, regardless of age. Moreover, a lower age decreases the predicted charges and a medium BMI has little to no impact on the prediction; however, both of these features have much less impact on the prediction than smoking.
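One rough way to check this kind of stability is to see whether each feature's LIME weight keeps the same sign across the explained observations, and how large it is on average. The helper below is hypothetical. It assumes each explanation has already been turned into a dict from feature name to weight, for example by mapping the `(description, weight)` pairs returned by the lime package's `Explanation.as_list()` back to feature names. The weights in the usage example are placeholders, not the notebook's results.

```python
from statistics import mean

def summarize_stability(explanations):
    """For each feature, report (same_sign_everywhere, mean_abs_weight).

    Assumes every explanation dict contains the same feature keys.
    Returns (feature, (same_sign, mean_abs)) pairs, most important first.
    """
    summary = {}
    for feature in explanations[0]:
        ws = [e[feature] for e in explanations]
        same_sign = all(w > 0 for w in ws) or all(w < 0 for w in ws)
        summary[feature] = (same_sign, mean(abs(w) for w in ws))
    return sorted(summary.items(), key=lambda kv: kv[1][1], reverse=True)

# Placeholder weights for three observations (illustrative only)
explanations = [
    {"smoker": -10.0, "age": -2.0, "bmi": 0.3},
    {"smoker": -9.5, "age": -1.5, "bmi": -0.2},
    {"smoker": -11.0, "age": -2.5, "bmi": 0.1},
]
ranking = summarize_stability(explanations)
for feature, (same_sign, mean_abs) in ranking:
    print(feature, same_sign, round(mean_abs, 2))
```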

Comment

  1. RandomForest seems to be more accurate, but it is hard to draw conclusions from a single example.
  2. The explanations show that the biggest difference between the models is how the prediction changes with the BMI value: for the same BMI value, the RandomForest model increases the predicted charges more than LinearRegression.
  3. In all three cases the RandomForest model is more accurate than LinearRegression. Compared with LinearRegression, RandomForest seems to put more weight on features other than smoking. The patterns noted in the points above are also visible in these examples: low age and low BMI reduce the predicted charges, and the same applies to the number of children. Being a smoker, as expected, massively increases the predicted charges.
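The accuracy comparison above amounts to comparing absolute prediction errors per observation. The hypothetical helper below makes that explicit and also reports the mean absolute error. With only three observations the "wins" count is anecdotal; the same function could be applied to a full held-out set. The numbers in the usage example are placeholders, not the notebook's predictions.

```python
def compare_errors(y_true, pred_a, pred_b):
    """Per-observation absolute errors of two models, how often model A is
    strictly closer to the truth, and each model's mean absolute error."""
    err_a = [abs(p - y) for p, y in zip(pred_a, y_true)]
    err_b = [abs(p - y) for p, y in zip(pred_b, y_true)]
    wins_a = sum(ea < eb for ea, eb in zip(err_a, err_b))
    mae_a = sum(err_a) / len(err_a)
    mae_b = sum(err_b) / len(err_b)
    return err_a, err_b, wins_a, mae_a, mae_b

# Placeholder values for three observations (illustrative only)
y_true = [10.0, 20.0, 30.0]
pred_forest = [11.0, 19.0, 33.0]
pred_linear = [14.0, 20.0, 25.0]
err_a, err_b, wins_a, mae_a, mae_b = compare_errors(y_true, pred_forest, pred_linear)
print(err_a, err_b, wins_a, mae_a, mae_b)
```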